NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

RCSB protein Data Bank: exploring protein 3D similarities via comprehensive structural alignments

https://doi.org/10.1093/bioinformatics/btae370

Bittrich, Sebastian; Segura, Joan; Duarte, Jose_M; Burley, Stephen_K; Rose, Yana; Elofsson, ed., Arne (June 2024, Bioinformatics)

Abstract MotivationTools for pairwise alignments between 3D structures of proteins are of fundamental importance for structural biology and bioinformatics, enabling visual exploration of evolutionary and functional relationships. However, the absence of a user-friendly, browser-based tool for creating alignments and visualizing them at both 1D sequence and 3D structural levels makes this process unnecessarily cumbersome. ResultsWe introduce a novel pairwise structure alignment tool (rcsb.org/alignment) that seamlessly integrates into the RCSB Protein Data Bank (RCSB PDB) research-focused RCSB.org web portal. Our tool and its underlying application programming interface (alignment.rcsb.org) empowers users to align several protein chains with a reference structure by providing access to established alignment algorithms (FATCAT, CE, TM-align, or Smith–Waterman 3D). The user-friendly interface simplifies parameter setup and input selection. Within seconds, our tool enables visualization of results in both sequence (1D) and structural (3D) perspectives through the RCSB PDB RCSB.org Sequence Annotations viewer and Mol* 3D viewer, respectively. Users can effortlessly compare structures deposited in the PDB archive alongside more than a million incorporated Computed Structure Models coming from the ModelArchive and AlphaFold DB. Moreover, this tool can be used to align custom structure data by providing a link/URL or uploading atomic coordinate files directly. Importantly, alignment results can be bookmarked and shared with collaborators. By bridging the gap between 1D sequence and 3D structures of proteins, our tool facilitates deeper understanding of complex evolutionary relationships among proteins through comprehensive sequence and structural analyses. Availability and implementationThe alignment tool is part of the RCSB PDB research-focused RCSB.org web portal and available at rcsb.org/alignment. Programmatic access is available via alignment.rcsb.org. Frontend code has been published at github.com/rcsb/rcsb-pecos-app. Visualization is powered by the open-source Mol* viewer (github.com/molstar/molstar and github.com/molstar/rcsb-molstar) plus the Sequence Annotations in 3D Viewer (github.com/rcsb/rcsb-saguaro-3d).
more » « less
pyCapsid: identifying dominant dynamics and quasi-rigid mechanical units in protein shells

https://doi.org/10.1093/bioinformatics/btad761

Brown, Colin; Agarwal, Anuradha; Luque, Antoni; Elofsson, ed., Arne (December 2023, Bioinformatics)

Abstract SummarypyCapsid is a Python package developed to facilitate the characterization of the dynamics and quasi-rigid mechanical units of protein shells and other protein complexes. The package was developed in response to the rapid increase of high-resolution structures, particularly capsids of viruses, requiring multiscale biophysical analyses. Given a protein shell, pyCapsid generates the collective vibrations of its amino-acid residues, identifies quasi-rigid mechanical regions associated with the disassembly of the structure, and maps the results back to the input proteins for interpretation. pyCapsid summarizes the main results in a report that includes publication-quality figures. Availability and implementationpyCapsid’s source code is available under MIT License on GitHub. It is compatible with Python 3.8–3.10 and has been deployed in two leading Python package-management systems, PIP and Conda. Installation instructions and tutorials are available in the online documentation and in the pyCapsid’s YouTube playlist. In addition, a cloud-based implementation of pyCapsid is available as a Google Colab notebook. pyCapsid Colab does not require installation and generates the same report and outputs as the installable version. Users can post issues regarding pyCapsid in the repository’s issues section.
more » « less
3DMolMS: prediction of tandem mass spectra from 3D molecular conformations

https://doi.org/10.1093/bioinformatics/btad354

Hong, Yuhui; Li, Sujun; Welch, Christopher J.; Tichy, Shane; Ye, Yuzhen; Tang, Haixu; Elofsson, ed., Arne (May 2023, Bioinformatics)

Abstract MotivationTandem mass spectrometry is an essential technology for characterizing chemical compounds at high sensitivity and throughput, and is commonly adopted in many fields. However, computational methods for automated compound identification from their MS/MS spectra are still limited, especially for novel compounds that have not been previously characterized. In recent years, in silico methods were proposed to predict the MS/MS spectra of compounds, which can then be used to expand the reference spectral libraries for compound identification. However, these methods did not consider the compounds’ 3D conformations, and thus neglected critical structural information. ResultsWe present the 3D Molecular Network for Mass Spectra Prediction (3DMolMS), a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. We evaluated the model on the experimental spectra collected in several spectral libraries. The results showed that 3DMolMS predicted the spectra with the average cosine similarity of 0.691 and 0.478 with the experimental MS/MS spectra acquired in positive and negative ion modes, respectively. Furthermore, 3DMolMS model can be generalized to the prediction of MS/MS spectra acquired by different labs on different instruments through minor fine-tuning on a small set of spectra. Finally, we demonstrate that the molecular representation learned by 3DMolMS from MS/MS spectra prediction can be adapted to enhance the prediction of chemical properties such as the elution time in the liquid chromatography and the collisional cross section measured by ion mobility spectrometry, both of which are often used to improve compound identification. Availability and implementationThe codes of 3DMolMS are available at https://github.com/JosieHong/3DMolMS and the web service is at https://spectrumprediction.gnps2.org.
more » « less
EvoEF2: accurate and fast energy function for computational protein design

https://doi.org/10.1093/bioinformatics/btz740

Huang, Xiaoqiang; Pearce, ,. Robin; Zhang, Yang; Elofsson, ed., Arne (October 2019, Bioinformatics)

Abstract MotivationThe accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs. ResultsWe developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues for 148 test monomers, and was generally applicable to protein–protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, significantly outperforming EvoEF on the native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences, where all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as demonstrated by the fact that 87.8% of the predicted structures shared a root-mean-square-deviation less than 2 Å to their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization processes, and EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data. Availability and implementationThe source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF. Supplementary informationSupplementary data are available at Bioinformatics online.
more » « less
DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning

https://doi.org/10.1093/bioinformatics/btz276

Ni, Peng; Huang, Neng; Zhang, Zhi; Wang, De-Peng; Liang, Fan; Miao, Yu; Xiao, Chuan-Le; Luo, Feng; Wang, Jianxin; Elofsson, ed., Arne (April 2019, Bioinformatics)

Search for: All records